Overview

Dataset statistics

Number of variables: 14
Number of observations: 10000
Missing cells: 8338
Missing cells (%): 6.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 1.1 MiB
Average record size in memory: 112.0 B
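The percentages and sizes in this table can be cross-checked from the raw counts. The following is a minimal sketch, assuming the missing-cell percentage is taken over all cells (variables × observations) and that 1 MiB is 2**20 bytes; the variable names are illustrative and not part of the report.

```python
# Cross-check the overview figures using only the numbers reported above.
n_variables = 14
n_observations = 10_000
missing_cells = 8_338

# Assumption: the percentage is computed over every cell in the table.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Average record size reported by the profile, in bytes.
avg_record_bytes = 112.0
total_mib = avg_record_bytes * n_observations / 2**20

print(f"Missing cells: {missing_pct:.1f}%")
print(f"Total size in memory: {total_mib:.1f} MiB")
```

Both results round to the reported values (6.0% and 1.1 MiB). The 112 B record size is also what 14 columns of 8 bytes each would occupy.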

Variable types

Numeric: 10
Categorical: 4

Warnings

product_name has a high cardinality: 9268 distinct values (High cardinality)
brands has a high cardinality: 5586 distinct values (High cardinality)
ingredients_text has a high cardinality: 8858 distinct values (High cardinality)
df_index is highly correlated with Unnamed: 0 (High correlation)
Unnamed: 0 is highly correlated with df_index (High correlation)
product_name has 104 (1.0%) missing values (Missing)
brands has 206 (2.1%) missing values (Missing)
ingredients_text has 885 (8.8%) missing values (Missing)
nutrition_grade_fr has 1416 (14.2%) missing values (Missing)
fat_100g has 560 (5.6%) missing values (Missing)
saturated-fat_100g has 1095 (10.9%) missing values (Missing)
carbohydrates_100g has 567 (5.7%) missing values (Missing)
sugars_100g has 583 (5.8%) missing values (Missing)
fiber_100g has 2551 (25.5%) missing values (Missing)
salt_100g has 228 (2.3%) missing values (Missing)
product_name is uniformly distributed (Uniform)
ingredients_text is uniformly distributed (Uniform)
df_index has unique values (Unique)
Unnamed: 0 has unique values (Unique)
energy_100g has 308 (3.1%) zeros (Zeros)
fat_100g has 2285 (22.9%) zeros (Zeros)
saturated-fat_100g has 2505 (25.1%) zeros (Zeros)
carbohydrates_100g has 733 (7.3%) zeros (Zeros)
sugars_100g has 1342 (13.4%) zeros (Zeros)
fiber_100g has 2487 (24.9%) zeros (Zeros)
proteins_100g has 1895 (18.9%) zeros (Zeros)
salt_100g has 1282 (12.8%) zeros (Zeros)

Reproduction

Analysis started: 2021-03-23 07:34:41.961855
Analysis finished: 2021-03-23 07:35:00.541425
Duration: 18.58 seconds
Software version: pandas-profiling v2.12.0
Download configuration: config.yaml
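A report of this kind is typically produced by passing a DataFrame to pandas-profiling. The sketch below shows one way this could be done; the helper name `build_report`, the file paths, and the title are hypothetical, and the source file and configuration actually used for this report are not known.

```python
def build_report(csv_path, html_path, title="Profiling report"):
    """Profile a CSV file and write an HTML report (hypothetical helper)."""
    # Imported lazily so the module loads even when the optional
    # third-party packages are not installed.
    import pandas as pd
    from pandas_profiling import ProfileReport

    # Read the table; the path is supplied by the caller.
    df = pd.read_csv(csv_path)

    # Build the profile and write it out as a standalone HTML file.
    profile = ProfileReport(df, title=title)
    profile.to_file(html_path)
    return profile
```

A call such as `build_report("products.csv", "report.html")` (placeholder file names) would write the report. A column named `Unnamed: 0` usually appears when a CSV was saved with its index and read back without `index_col=0`, which may explain that variable here.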

Variables

df_index
Real number (ℝ≥0)

HIGH CORRELATION
UNIQUE

Distinct: 10000
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 146760.3517
Minimum: 6
Maximum: 296434
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 78.2 KiB

Quantile statistics

Minimum: 6
5th percentile: 14921.65
Q1: 72413.25
Median: 146395.5
Q3: 219896.5
95th percentile: 282103.05
Maximum: 296434
Range: 296428
Interquartile range (IQR): 147483.25

Descriptive statistics

Standard deviation: 85676.0385
Coefficient of variation (CV): 0.5837819105
Kurtosis: -1.202221965
Mean: 146760.3517
Median Absolute Deviation (MAD): 73758.5
Skewness: 0.02868457431
Sum: 1467603517
Variance: 7340383572
Monotonicity: Not monotonic
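Several of the df_index statistics above are derived from one another, so they can be checked for internal consistency. The sketch below uses only figures copied from the report and the standard definitions (range = maximum − minimum, IQR = Q3 − Q1, CV = standard deviation / mean, variance = standard deviation squared); the variable names are illustrative.

```python
import math

# Figures copied from the df_index statistics.
minimum, maximum = 6, 296434
q1, q3 = 72413.25, 219896.5
mean, std = 146760.3517, 85676.0385

value_range = maximum - minimum   # reported: 296428
iqr = q3 - q1                     # reported: 147483.25
cv = std / mean                   # reported: 0.5837819105
variance = std ** 2               # reported: 7340383572

# The reported values are rounded, so compare with a small tolerance.
assert value_range == 296428
assert iqr == 147483.25
assert math.isclose(cv, 0.5837819105, rel_tol=1e-6)
assert math.isclose(variance, 7340383572, rel_tol=1e-6)
```

The same relations can be applied to any other numeric variable in the report.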
Histogram with fixed size bins (bins=50)
Value                  Count  Frequency (%)
131072                     1  < 0.1%
217831                     1  < 0.1%
78555                      1  < 0.1%
51932                      1  < 0.1%
230109                     1  < 0.1%
219870                     1  < 0.1%
125202                     1  < 0.1%
15074                      1  < 0.1%
198386                     1  < 0.1%
109288                     1  < 0.1%
Other values (9990)     9990  99.9%

Value                  Count  Frequency (%)
6                          1  < 0.1%
20                         1  < 0.1%
24                         1  < 0.1%
70                         1  < 0.1%
137                        1  < 0.1%

Value                  Count  Frequency (%)
296434                     1  < 0.1%
296391                     1  < 0.1%
296383                     1  < 0.1%
296360                     1  < 0.1%
296343                     1  < 0.1%

Unnamed: 0
Real number (ℝ≥0)

HIGH CORRELATION
UNIQUE

Distinct: 10000
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 160841.9487
Minimum: 7
Maximum: 355884
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 78.2 KiB

Quantile statistics

Minimum: 7
5th percentile: 15438.65
Q1: 74826.25
Median: 151847.5
Q3: 242296.5
95th percentile: 331935.05
Maximum: 355884
Range: 355877
Interquartile range (IQR): 167470.25

Descriptive statistics

Standard deviation: 100420.8039
Coefficient of variation (CV): 0.6243446112
Kurtosis: -1.105132484
Mean: 160841.9487
Median Absolute Deviation (MAD): 82982.5
Skewness: 0.2208738151
Sum: 1608419487
Variance: 1.008433786 × 10^10
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                  Count  Frequency (%)
86018                      1  < 0.1%
4831                       1  < 0.1%
37591                      1  < 0.1%
223960                     1  < 0.1%
315526                     1  < 0.1%
129754                     1  < 0.1%
169945                     1  < 0.1%
33501                      1  < 0.1%
285406                     1  < 0.1%
92896                      1  < 0.1%
Other values (9990)     9990  99.9%

Value                  Count  Frequency (%)
7                          1  < 0.1%
21                         1  < 0.1%
25                         1  < 0.1%
75                         1  < 0.1%
144                        1  < 0.1%

Value                  Count  Frequency (%)
355884                     1  < 0.1%
355812                     1  < 0.1%
355803                     1  < 0.1%
355771                     1  < 0.1%
355725                     1  < 0.1%

product_name
Categorical

HIGH CARDINALITY
MISSING
UNIFORM

Distinct: 9268
Distinct (%): 93.7%
Missing: 104
Missing (%): 1.0%
Memory size: 78.2 KiB

Ice Cream: 16
Extra Virgin Olive Oil: 11
Cookies: 9
Potato Chips: 7
Coconut Water: 7
Other values (9263): 9846

Length

Max length: 161
Median length: 24
Mean length: 26.63156831
Min length: 3

Characters and Unicode

Total characters: 263546
Distinct characters: 234
Distinct categories: 16
Distinct scripts: 5
Distinct blocks: 6
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 8864
Unique (%): 89.6%
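The report lists "Distinct" and "Unique" separately. A minimal sketch of the difference, assuming "distinct" counts the different values present and "unique" counts values that occur exactly once, with missing entries ignored; the helper name and the sample list are hypothetical.

```python
from collections import Counter

def distinct_and_unique(values):
    """Return (distinct, unique) counts, ignoring missing (None) entries."""
    counts = Counter(v for v in values if v is not None)
    distinct = len(counts)                               # different values
    unique = sum(1 for c in counts.values() if c == 1)   # seen exactly once
    return distinct, unique

# Tiny illustrative sample, not taken from the dataset.
sample = ["Ice Cream", "Cookies", "Ice Cream", "Syrup", None]
print(distinct_and_unique(sample))  # (3, 2)

# The reported percentages are consistent with using the non-missing
# row count (10000 - 104) as the denominator.
non_missing = 10_000 - 104
distinct_pct = 100 * 9268 / non_missing
unique_pct = 100 * 8864 / non_missing
print(f"{distinct_pct:.1f}% distinct, {unique_pct:.1f}% unique")
```

With the product_name figures this reproduces the 93.7% and 89.6% shown in the report.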

Sample

1st row: Premium Ice Cream
2nd row: Kohler, Strawberry Balsamic Rare Facets Chocolate
3rd row: Center Cut Chops Boneless Thin Pork
4th row: Corn Flakes Glacés au sucre
5th row: Homekist, Sandwich Cookies, Lemon Creme
Value                    Count  Frequency (%)
Ice Cream                   16  0.2%
Extra Virgin Olive Oil      11  0.1%
Cookies                      9  0.1%
Potato Chips                 7  0.1%
Coconut Water                7  0.1%
Juice                        6  0.1%
Tomato Sauce                 6  0.1%
Spaghetti                    6  0.1%
Trail Mix                    6  0.1%
Syrup                        6  0.1%
Other values (9258)       9816  98.2%
(Missing)                  104  1.0%
Histogram of lengths of the category
ValueCountFrequency (%)
de958
 
2.4%
881
 
2.2%
chocolate405
 
1.0%
sauce351
 
0.9%
cheese347
 
0.9%
organic309
 
0.8%
au302
 
0.7%
with287
 
0.7%
mix263
 
0.6%
à243
 
0.6%
Other values (7772)36261
89.3%

Most occurring characters

ValueCountFrequency (%)
30812
 
11.7%
e26038
 
9.9%
a20024
 
7.6%
r15420
 
5.9%
i15153
 
5.7%
o13556
 
5.1%
t12219
 
4.6%
n11827
 
4.5%
s11350
 
4.3%
l10768
 
4.1%
Other values (224)96379
36.6%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter191159
72.5%
Uppercase Letter34235
 
13.0%
Space Separator30813
 
11.7%
Other Punctuation4862
 
1.8%
Decimal Number1555
 
0.6%
Dash Punctuation498
 
0.2%
Open Punctuation174
 
0.1%
Close Punctuation170
 
0.1%
Math Symbol45
 
< 0.1%
Other Letter11
 
< 0.1%
Other values (6)24
 
< 0.1%

Most frequent character per category

ValueCountFrequency (%)
e26038
13.6%
a20024
10.5%
r15420
 
8.1%
i15153
 
7.9%
o13556
 
7.1%
t12219
 
6.4%
n11827
 
6.2%
s11350
 
5.9%
l10768
 
5.6%
u8436
 
4.4%
Other values (98)46368
24.3%
ValueCountFrequency (%)
C5311
15.5%
S4127
12.1%
P2971
 
8.7%
B2745
 
8.0%
M2165
 
6.3%
F1731
 
5.1%
T1475
 
4.3%
G1455
 
4.3%
A1271
 
3.7%
O1271
 
3.7%
Other values (56)9713
28.4%
ValueCountFrequency (%)
,3033
62.4%
&710
 
14.6%
'537
 
11.0%
%229
 
4.7%
.147
 
3.0%
;79
 
1.6%
/49
 
1.0%
!38
 
0.8%
:21
 
0.4%
*5
 
0.1%
Other values (7)14
 
0.3%
ValueCountFrequency (%)
0440
28.3%
1259
16.7%
2232
14.9%
5145
 
9.3%
4130
 
8.4%
3115
 
7.4%
694
 
6.0%
872
 
4.6%
740
 
2.6%
928
 
1.8%
ValueCountFrequency (%)
2
18.2%
2
18.2%
1
9.1%
1
9.1%
1
9.1%
1
9.1%
1
9.1%
1
9.1%
1
9.1%
ValueCountFrequency (%)
(169
97.1%
[2
 
1.1%
{2
 
1.1%
1
 
0.6%
ValueCountFrequency (%)
2
33.3%
2
33.3%
1
16.7%
1
16.7%
ValueCountFrequency (%)
+41
91.1%
<2
 
4.4%
>2
 
4.4%
ValueCountFrequency (%)
®3
37.5%
°3
37.5%
2
25.0%
ValueCountFrequency (%)
30812
> 99.9%
 1
 
< 0.1%
ValueCountFrequency (%)
)168
98.8%
]2
 
1.2%
ValueCountFrequency (%)
«2
66.7%
1
33.3%
ValueCountFrequency (%)
-498
100.0%
ValueCountFrequency (%)
»2
100.0%
ValueCountFrequency (%)
$4
100.0%
ValueCountFrequency (%)
´1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin224487
85.2%
Common38135
 
14.5%
Cyrillic813
 
0.3%
Greek94
 
< 0.1%
Thai17
 
< 0.1%

Most frequent character per script

ValueCountFrequency (%)
e26038
 
11.6%
a20024
 
8.9%
r15420
 
6.9%
i15153
 
6.8%
o13556
 
6.0%
t12219
 
5.4%
n11827
 
5.3%
s11350
 
5.1%
l10768
 
4.8%
u8436
 
3.8%
Other values (77)79696
35.5%
ValueCountFrequency (%)
о95
 
11.7%
а65
 
8.0%
и64
 
7.9%
р59
 
7.3%
н55
 
6.8%
е54
 
6.6%
к43
 
5.3%
с42
 
5.2%
л40
 
4.9%
т27
 
3.3%
Other values (38)269
33.1%
ValueCountFrequency (%)
30812
80.8%
,3033
 
8.0%
&710
 
1.9%
'537
 
1.4%
-498
 
1.3%
0440
 
1.2%
1259
 
0.7%
2232
 
0.6%
%229
 
0.6%
(169
 
0.4%
Other values (37)1216
 
3.2%
ValueCountFrequency (%)
ο7
 
7.4%
α5
 
5.3%
ς5
 
5.3%
μ5
 
5.3%
Ο5
 
5.3%
ρ4
 
4.3%
ν4
 
4.3%
Κ3
 
3.2%
ε3
 
3.2%
λ3
 
3.2%
Other values (29)50
53.2%
ValueCountFrequency (%)
2
11.8%
2
11.8%
2
11.8%
2
11.8%
1
 
5.9%
1
 
5.9%
1
 
5.9%
1
 
5.9%
1
 
5.9%
1
 
5.9%
Other values (3)3
17.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII260225
98.7%
None2485
 
0.9%
Cyrillic813
 
0.3%
Thai17
 
< 0.1%
Punctuation4
 
< 0.1%
Letterlike Symbols2
 
< 0.1%

Most frequent character per block

ValueCountFrequency (%)
30812
 
11.8%
e26038
 
10.0%
a20024
 
7.7%
r15420
 
5.9%
i15153
 
5.8%
o13556
 
5.2%
t12219
 
4.7%
n11827
 
4.5%
s11350
 
4.4%
l10768
 
4.1%
Other values (78)93058
35.8%
ValueCountFrequency (%)
é1351
54.4%
à241
 
9.7%
è201
 
8.1%
â109
 
4.4%
ê67
 
2.7%
ü48
 
1.9%
ä46
 
1.9%
û43
 
1.7%
ô35
 
1.4%
É35
 
1.4%
Other values (70)309
 
12.4%
ValueCountFrequency (%)
о95
 
11.7%
а65
 
8.0%
и64
 
7.9%
р59
 
7.3%
н55
 
6.8%
е54
 
6.6%
к43
 
5.3%
с42
 
5.2%
л40
 
4.9%
т27
 
3.3%
Other values (38)269
33.1%
ValueCountFrequency (%)
2
100.0%
ValueCountFrequency (%)
1
25.0%
1
25.0%
1
25.0%
1
25.0%
ValueCountFrequency (%)
2
11.8%
2
11.8%
2
11.8%
2
11.8%
1
 
5.9%
1
 
5.9%
1
 
5.9%
1
 
5.9%
1
 
5.9%
1
 
5.9%
Other values (3)3
17.6%

brands
Categorical

HIGH CARDINALITY
MISSING

Distinct: 5586
Distinct (%): 57.0%
Missing: 206
Missing (%): 2.1%
Memory size: 78.2 KiB

Auchan: 107
Carrefour: 104
U: 81
Leader Price: 72
Casino: 65
Other values (5581): 9365

Length

Max length: 106
Median length: 12
Mean length: 15.12824178
Min length: 1

Characters and Unicode

Total characters: 148166
Distinct characters: 182
Distinct categories: 16
Distinct scripts: 5
Distinct blocks: 6
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 4359
Unique (%): 44.5%

Sample

1st row: Casper's Ice Cream Inc.
2nd row: Kohler Original Recipe Chocolates
3rd row: Target Stores
4th row: Crownfield
5th row: Vista Bakery Inc.
Value                  Count  Frequency (%)
Auchan                   107  1.1%
Carrefour                104  1.0%
U                         81  0.8%
Leader Price              72  0.7%
Casino                    65  0.7%
Meijer                    64  0.6%
Kroger                    54  0.5%
Spartan                   44  0.4%
Roundy's                  41  0.4%
Great Value               39  0.4%
Other values (5576)     9123  91.2%
(Missing)                206  2.1%
Histogram of lengths of the category
ValueCountFrequency (%)
inc1259
 
5.6%
foods494
 
2.2%
312
 
1.4%
company311
 
1.4%
llc283
 
1.3%
food256
 
1.1%
co253
 
1.1%
the178
 
0.8%
market143
 
0.6%
carrefour141
 
0.6%
Other values (5850)18693
83.7%

Most occurring characters

ValueCountFrequency (%)
14709
 
9.9%
e12563
 
8.5%
a10687
 
7.2%
r9788
 
6.6%
o9466
 
6.4%
n8239
 
5.6%
i7613
 
5.1%
s6470
 
4.4%
t5645
 
3.8%
l5486
 
3.7%
Other values (172)57500
38.8%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter104204
70.3%
Uppercase Letter23051
 
15.6%
Space Separator14709
 
9.9%
Other Punctuation5490
 
3.7%
Dash Punctuation354
 
0.2%
Decimal Number252
 
0.2%
Open Punctuation39
 
< 0.1%
Close Punctuation39
 
< 0.1%
Math Symbol8
 
< 0.1%
Other Letter8
 
< 0.1%
Other values (6)12
 
< 0.1%

Most frequent character per category

ValueCountFrequency (%)
e12563
12.1%
a10687
10.3%
r9788
9.4%
o9466
 
9.1%
n8239
 
7.9%
i7613
 
7.3%
s6470
 
6.2%
t5645
 
5.4%
l5486
 
5.3%
c4499
 
4.3%
Other values (77)23748
22.8%
ValueCountFrequency (%)
C2415
 
10.5%
S1976
 
8.6%
F1751
 
7.6%
I1655
 
7.2%
M1631
 
7.1%
B1501
 
6.5%
L1410
 
6.1%
P1187
 
5.1%
A1026
 
4.5%
T969
 
4.2%
Other values (45)7530
32.7%
ValueCountFrequency (%)
.1909
34.8%
,1865
34.0%
'970
17.7%
&326
 
5.9%
/305
 
5.6%
:73
 
1.3%
!34
 
0.6%
"4
 
0.1%
%3
 
0.1%
@1
 
< 0.1%
ValueCountFrequency (%)
346
18.3%
544
17.5%
639
15.5%
130
11.9%
025
9.9%
220
7.9%
718
 
7.1%
413
 
5.2%
810
 
4.0%
97
 
2.8%
ValueCountFrequency (%)
1
12.5%
1
12.5%
1
12.5%
1
12.5%
1
12.5%
1
12.5%
1
12.5%
1
12.5%
ValueCountFrequency (%)
-353
99.7%
1
 
0.3%
ValueCountFrequency (%)
14709
100.0%
ValueCountFrequency (%)
1
100.0%
ValueCountFrequency (%)
(39
100.0%
ValueCountFrequency (%)
)39
100.0%
ValueCountFrequency (%)
+8
100.0%
ValueCountFrequency (%)
$4
100.0%
ValueCountFrequency (%)
2
100.0%
ValueCountFrequency (%)
`1
100.0%
ValueCountFrequency (%)
«2
100.0%
ValueCountFrequency (%)
»2
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin126920
85.7%
Common20901
 
14.1%
Cyrillic324
 
0.2%
Greek11
 
< 0.1%
Thai10
 
< 0.1%

Most frequent character per script

ValueCountFrequency (%)
e12563
 
9.9%
a10687
 
8.4%
r9788
 
7.7%
o9466
 
7.5%
n8239
 
6.5%
i7613
 
6.0%
s6470
 
5.1%
t5645
 
4.4%
l5486
 
4.3%
c4499
 
3.5%
Other values (71)46464
36.6%
ValueCountFrequency (%)
а27
 
8.3%
о27
 
8.3%
е22
 
6.8%
р21
 
6.5%
н19
 
5.9%
и18
 
5.6%
с16
 
4.9%
к14
 
4.3%
л13
 
4.0%
т12
 
3.7%
Other values (40)135
41.7%
ValueCountFrequency (%)
14709
70.4%
.1909
 
9.1%
,1865
 
8.9%
'970
 
4.6%
-353
 
1.7%
&326
 
1.6%
/305
 
1.5%
:73
 
0.3%
346
 
0.2%
544
 
0.2%
Other values (21)301
 
1.4%
ValueCountFrequency (%)
Δ1
9.1%
ω1
9.1%
δ1
9.1%
ώ1
9.1%
ν1
9.1%
η1
9.1%
Ε1
9.1%
λ1
9.1%
α1
9.1%
ϊ1
9.1%
ValueCountFrequency (%)
2
20.0%
1
10.0%
1
10.0%
1
10.0%
1
10.0%
1
10.0%
1
10.0%
1
10.0%
1
10.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII147196
99.3%
None634
 
0.4%
Cyrillic324
 
0.2%
Thai10
 
< 0.1%
Letterlike Symbols1
 
< 0.1%
Punctuation1
 
< 0.1%

Most frequent character per block

ValueCountFrequency (%)
14709
 
10.0%
e12563
 
8.5%
a10687
 
7.3%
r9788
 
6.6%
o9466
 
6.4%
n8239
 
5.6%
i7613
 
5.2%
s6470
 
4.4%
t5645
 
3.8%
l5486
 
3.7%
Other values (69)56530
38.4%
ValueCountFrequency (%)
é349
55.0%
è122
 
19.2%
ô22
 
3.5%
ü21
 
3.3%
ó18
 
2.8%
ê12
 
1.9%
É9
 
1.4%
î8
 
1.3%
ä8
 
1.3%
í7
 
1.1%
Other values (32)58
 
9.1%
ValueCountFrequency (%)
а27
 
8.3%
о27
 
8.3%
е22
 
6.8%
р21
 
6.5%
н19
 
5.9%
и18
 
5.6%
с16
 
4.9%
к14
 
4.3%
л13
 
4.0%
т12
 
3.7%
Other values (40)135
41.7%
ValueCountFrequency (%)
1
100.0%
ValueCountFrequency (%)
2
20.0%
1
10.0%
1
10.0%
1
10.0%
1
10.0%
1
10.0%
1
10.0%
1
10.0%
1
10.0%
ValueCountFrequency (%)
1
100.0%

ingredients_text
Categorical

HIGH CARDINALITY
MISSING
UNIFORM

Distinct: 8858
Distinct (%): 97.2%
Missing: 885
Missing (%): 8.8%
Memory size: 78.2 KiB

Count  Value
8      Extra virgin olive oil
7      Pecans.
7      Semoule de _blé_ dur de qualité supérieure.
7      Semolina (wheat), durum flour (wheat), niacin, ferrous sulfate (iron), thiamin mononitrate, riboflavin, folic acid.
7      Extra virgin olive oil.
9079   Other values (8853)

Length

Max length: 4468
Median length: 183
Mean length: 215.7068568
Min length: 1

Characters and Unicode

Total characters: 1966168
Distinct characters: 275
Distinct categories: 20
Distinct scripts: 5
Distinct blocks: 7
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 8696
Unique (%): 95.4%

Sample

1st row: Fresh whole milk, sugar, cream, mint flake (corn syrup, sugar, partially hydrogenated soybean oil, red lake 40, green 3, red 40, peppermint oil, soy lecithin), corn syrup solids, non-fat dry milk, stabilizer (microcrystalline cellulose, cellulose gum, mon
2nd row: 61% chocolate (cocoa beans, pure cane sugar, cocoa butter, soy lecithin, vanilla bean), milk chocolate (pure cane sugar, full cream milk, cocoa butter, cocoa beans, soy lecithin, vanilla bean), strawberry puree (strawberry 87%, invert sugar syrup), 55% ch
3rd row: Pork, *solution ingredients: water, potassium lactate, sodium phosphates, salt, diacetate.
4th row: Maïs 77% Sucre 28% Extrait de Malte d'orge Sel
5th row: Enriched flour (wheat flour, niacin, reduced iron, thiamine mononitrate, riboflavin, folic acid), sugar, vegetable oil (contains one or more of the following: canola oil, corn oil, palm oil, soybean oil), dextrose, high fructose corn syrup, corn syrup, co
Count  Frequency (%)  Value
8      0.1%           Extra virgin olive oil
7      0.1%           Pecans.
7      0.1%           Semoule de _blé_ dur de qualité supérieure.
7      0.1%           Semolina (wheat), durum flour (wheat), niacin, ferrous sulfate (iron), thiamin mononitrate, riboflavin, folic acid.
7      0.1%           Extra virgin olive oil.
7      0.1%           Almonds.
6      0.1%           Durum semolina, niacin, ferrous sulfate (iron), thiamine mononitrate, riboflavin, folic acid.
6      0.1%           Soybean oil.
5      0.1%           Carbonated water, natural flavor.
5      0.1%           Spinach.
9050   90.5%          Other values (8848)
885    8.8%           (Missing)
Histogram of lengths of the category
ValueCountFrequency (%)
de13304
 
4.8%
7962
 
2.9%
salt4571
 
1.6%
sugar3690
 
1.3%
oil3313
 
1.2%
acid3231
 
1.2%
water3135
 
1.1%
flour2801
 
1.0%
and2612
 
0.9%
organic2544
 
0.9%
Other values (16648)230087
83.0%

Most occurring characters

ValueCountFrequency (%)
268919
 
13.7%
e168547
 
8.6%
a141767
 
7.2%
r116227
 
5.9%
i112668
 
5.7%
o105970
 
5.4%
t98060
 
5.0%
s90269
 
4.6%
,89685
 
4.6%
n87755
 
4.5%
Other values (265)686301
34.9%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter1471189
74.8%
Space Separator268935
 
13.7%
Other Punctuation121601
 
6.2%
Uppercase Letter37029
 
1.9%
Decimal Number25346
 
1.3%
Open Punctuation15680
 
0.8%
Close Punctuation14777
 
0.8%
Connector Punctuation7774
 
0.4%
Dash Punctuation3361
 
0.2%
Math Symbol198
 
< 0.1%
Other values (10)278
 
< 0.1%

Most frequent character per category

ValueCountFrequency (%)
e168547
11.5%
a141767
 
9.6%
r116227
 
7.9%
i112668
 
7.7%
o105970
 
7.2%
t98060
 
6.7%
s90269
 
6.1%
n87755
 
6.0%
l83430
 
5.7%
c77494
 
5.3%
Other values (119)389002
26.4%
ValueCountFrequency (%)
E4810
 
13.0%
S3412
 
9.2%
C2786
 
7.5%
A2397
 
6.5%
P2267
 
6.1%
I2103
 
5.7%
O1862
 
5.0%
T1706
 
4.6%
R1629
 
4.4%
F1446
 
3.9%
Other values (40)12611
34.1%
ValueCountFrequency (%)
5
 
8.5%
5
 
8.5%
5
 
8.5%
4
 
6.8%
4
 
6.8%
4
 
6.8%
4
 
6.8%
3
 
5.1%
3
 
5.1%
3
 
5.1%
Other values (14)19
32.2%
ValueCountFrequency (%)
,89685
73.8%
.9858
 
8.1%
%6701
 
5.5%
:5935
 
4.9%
*2696
 
2.2%
'2568
 
2.1%
/1516
 
1.2%
;980
 
0.8%
&718
 
0.6%
#374
 
0.3%
Other values (10)570
 
0.5%
ValueCountFrequency (%)
04394
17.3%
14317
17.0%
23701
14.6%
52837
11.2%
32439
9.6%
42423
9.6%
61888
7.4%
71280
 
5.1%
81141
 
4.5%
9926
 
3.7%
ValueCountFrequency (%)
3
18.8%
3
18.8%
2
12.5%
2
12.5%
2
12.5%
2
12.5%
1
 
6.2%
1
 
6.2%
ValueCountFrequency (%)
+152
76.8%
=33
 
16.7%
|5
 
2.5%
±5
 
2.5%
<3
 
1.5%
ValueCountFrequency (%)
(14115
90.0%
[1407
 
9.0%
{151
 
1.0%
7
 
< 0.1%
ValueCountFrequency (%)
)13389
90.6%
]1249
 
8.5%
}139
 
0.9%
ValueCountFrequency (%)
104
95.4%
»4
 
3.7%
1
 
0.9%
ValueCountFrequency (%)
17
41.5%
16
39.0%
«8
19.5%
ValueCountFrequency (%)
$21
56.8%
15
40.5%
¤1
 
2.7%
ValueCountFrequency (%)
268919
> 99.9%
 16
 
< 0.1%
ValueCountFrequency (%)
-3328
99.0%
33
 
1.0%
ValueCountFrequency (%)
¹2
66.7%
1
33.3%
ValueCountFrequency (%)
’2
66.7%
–1
33.3%
ValueCountFrequency (%)
°4
50.0%
®4
50.0%
ValueCountFrequency (%)
_7774
100.0%
ValueCountFrequency (%)
`1
100.0%
ValueCountFrequency (%)
1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin1505009
76.5%
Common457875
 
23.3%
Cyrillic2961
 
0.2%
Greek248
 
< 0.1%
Thai75
 
< 0.1%

Most frequent character per script

ValueCountFrequency (%)
e168547
 
11.2%
a141767
 
9.4%
r116227
 
7.7%
i112668
 
7.5%
o105970
 
7.0%
t98060
 
6.5%
s90269
 
6.0%
n87755
 
5.8%
l83430
 
5.5%
c77494
 
5.1%
Other values (103)422822
28.1%
ValueCountFrequency (%)
268919
58.7%
,89685
 
19.6%
(14115
 
3.1%
)13389
 
2.9%
.9858
 
2.2%
_7774
 
1.7%
%6701
 
1.5%
:5935
 
1.3%
04394
 
1.0%
14317
 
0.9%
Other values (54)32788
 
7.2%
ValueCountFrequency (%)
о351
 
11.9%
а312
 
10.5%
н215
 
7.3%
е201
 
6.8%
и188
 
6.3%
л178
 
6.0%
р170
 
5.7%
т148
 
5.0%
с145
 
4.9%
к137
 
4.6%
Other values (24)916
30.9%
ValueCountFrequency (%)
α24
 
9.7%
τ23
 
9.3%
ο21
 
8.5%
ι19
 
7.7%
κ17
 
6.9%
ρ13
 
5.2%
ά12
 
4.8%
λ12
 
4.8%
υ11
 
4.4%
σ9
 
3.6%
Other values (22)87
35.1%
ValueCountFrequency (%)
5
 
6.7%
5
 
6.7%
5
 
6.7%
4
 
5.3%
4
 
5.3%
4
 
5.3%
4
 
5.3%
3
 
4.0%
3
 
4.0%
3
 
4.0%
Other values (22)35
46.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII1941759
98.8%
None20955
 
1.1%
Cyrillic2961
 
0.2%
Punctuation368
 
< 0.1%
Thai75
 
< 0.1%
Alphabetic PF35
 
< 0.1%
Currency Symbols15
 
< 0.1%

Most frequent character per block

ValueCountFrequency (%)
268919
13.8%
e168547
 
8.7%
a141767
 
7.3%
r116227
 
6.0%
i112668
 
5.8%
o105970
 
5.5%
t98060
 
5.1%
s90269
 
4.6%
,89685
 
4.6%
n87755
 
4.5%
Other values (82)661892
34.1%
ValueCountFrequency (%)
é13489
64.4%
ô1573
 
7.5%
è1037
 
4.9%
à932
 
4.4%
â599
 
2.9%
ï472
 
2.3%
œ334
 
1.6%
ü282
 
1.3%
É273
 
1.3%
ä219
 
1.0%
Other values (94)1745
 
8.3%
ValueCountFrequency (%)
174
47.3%
104
28.3%
33
 
9.0%
17
 
4.6%
16
 
4.3%
14
 
3.8%
7
 
1.9%
1
 
0.3%
1
 
0.3%
1
 
0.3%
ValueCountFrequency (%)
27
77.1%
8
 
22.9%
ValueCountFrequency (%)
о351
 
11.9%
а312
 
10.5%
н215
 
7.3%
е201
 
6.8%
и188
 
6.3%
л178
 
6.0%
р170
 
5.7%
т148
 
5.0%
с145
 
4.9%
к137
 
4.6%
Other values (24)916
30.9%
ValueCountFrequency (%)
15
100.0%
ValueCountFrequency (%)
5
 
6.7%
5
 
6.7%
5
 
6.7%
4
 
5.3%
4
 
5.3%
4
 
5.3%
4
 
5.3%
3
 
4.0%
3
 
4.0%
3
 
4.0%
Other values (22)35
46.7%

nutrition_grade_fr
Categorical

MISSING

Distinct: 5
Distinct (%): 0.1%
Missing: 1416
Missing (%): 14.2%
Memory size: 78.2 KiB

d: 2455
c: 1779
e: 1676
a: 1393
b: 1281

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 8584
Distinct characters: 5
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: d
2nd row: c
3rd row: c
4th row: d
5th row: e
Value        Count  Frequency (%)
d             2455  24.6%
c             1779  17.8%
e             1676  16.8%
a             1393  13.9%
b             1281  12.8%
(Missing)     1416  14.2%
Histogram of lengths of the category
Value        Count  Frequency (%)
d             2455  28.6%
c             1779  20.7%
e             1676  19.5%
a             1393  16.2%
b             1281  14.9%
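The two frequency tables for nutrition_grade_fr show different percentages for the same counts. The figures are consistent with the first table dividing by all 10000 rows (missing included) and the second dividing by the 8584 rows that have a grade. A short sketch of that arithmetic, using counts copied from the report and illustrative variable names:

```python
# Counts per grade and the number of missing values, as reported.
counts = {"d": 2455, "c": 1779, "e": 1676, "a": 1393, "b": 1281}
missing = 1416

n_present = sum(counts.values())   # rows that carry a grade
n_total = n_present + missing      # all rows, missing included

for grade, n in counts.items():
    share_all = 100 * n / n_total          # denominator: every row
    share_present = 100 * n / n_present    # denominator: graded rows only
    print(f"{grade}: {share_all:.2f}% of all rows, "
          f"{share_present:.2f}% of graded rows")
```

Which denominator is appropriate depends on whether missing grades should count as their own category in downstream analysis.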

Most occurring characters

Value        Count  Frequency (%)
d             2455  28.6%
c             1779  20.7%
e             1676  19.5%
a             1393  16.2%
b             1281  14.9%

Most occurring categories

Value               Count  Frequency (%)
Lowercase Letter     8584  100.0%

Most frequent character per category

Value        Count  Frequency (%)
d             2455  28.6%
c             1779  20.7%
e             1676  19.5%
a             1393  16.2%
b             1281  14.9%

Most occurring scripts

Value        Count  Frequency (%)
Latin         8584  100.0%

Most frequent character per script

Value        Count  Frequency (%)
d             2455  28.6%
c             1779  20.7%
e             1676  19.5%
a             1393  16.2%
b             1281  14.9%

Most occurring blocks

Value        Count  Frequency (%)
ASCII         8584  100.0%

Most frequent character per block

Value        Count  Frequency (%)
d             2455  28.6%
c             1779  20.7%
e             1676  19.5%
a             1393  16.2%
b             1281  14.9%

energy_100g
Real number (ℝ≥0)

ZEROS

Distinct: 1780
Distinct (%): 17.9%
Missing: 52
Missing (%): 0.5%
Infinite: 0
Infinite (%): 0.0%
Mean: 1128.245091
Minimum: 0
Maximum: 3768
Zeros: 308
Zeros (%): 3.1%
Negative: 0
Negative (%): 0.0%
Memory size: 78.2 KiB

Quantile statistics

Minimum: 0
5th percentile: 71
Q1: 389
Median: 1105
Q3: 1674
95th percentile: 2389
Maximum: 3768
Range: 3768
Interquartile range (IQR): 1285

Descriptive statistics

Standard deviation: 792.5466442
Coefficient of variation (CV): 0.7024596431
Kurtosis: -0.403757058
Mean: 1128.245091
Median Absolute Deviation (MAD): 657
Skewness: 0.4355420009
Sum: 11223782.17
Variance: 628130.1833
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                  Count  Frequency (%)
0                        308  3.1%
2092                     163  1.6%
1674                     159  1.6%
1494                     130  1.3%
1393                     120  1.2%
1046                     113  1.1%
1644                     111  1.1%
1569                      90  0.9%
1197                      89  0.9%
837                       78  0.8%
Other values (1770)     8587  85.9%

Value                  Count  Frequency (%)
0                        308  3.1%
0.42                       1  < 0.1%
1                          3  < 0.1%
2                          1  < 0.1%
3                          2  < 0.1%

Value                  Count  Frequency (%)
3768                       1  < 0.1%
3766                      15  0.1%
3761                       3  < 0.1%
3749                       1  < 0.1%
3707                       1  < 0.1%

fat_100g
Real number (ℝ≥0)

MISSING
ZEROS

Distinct: 1291
Distinct (%): 13.7%
Missing: 560
Missing (%): 5.6%
Infinite: 0
Infinite (%): 0.0%
Mean: 12.83285781
Minimum: 0
Maximum: 100
Zeros: 2285
Zeros (%): 22.9%
Negative: 0
Negative (%): 0.0%
Memory size: 78.2 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0.1
Median: 5.36
Q3: 20
95th percentile: 46.43
Maximum: 100
Range: 100
Interquartile range (IQR): 19.9

Descriptive statistics

Standard deviation: 17.41943943
Coefficient of variation (CV): 1.357409214
Kurtosis: 6.229673009
Mean: 12.83285781
Median Absolute Deviation (MAD): 5.36
Skewness: 2.197307383
Sum: 121142.1777
Variance: 303.4368701
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                  Count  Frequency (%)
0                       2285  22.9%
0.5                      171  1.7%
0.1                      132  1.3%
25                       126  1.3%
20                       105  1.1%
32.14                     97  1.0%
28.57                     95  0.9%
10                        84  0.8%
30                        84  0.8%
1.79                      77  0.8%
Other values (1281)     6184  61.8%
(Missing)                560  5.6%

Value                  Count  Frequency (%)
0                       2285  22.9%
0.007                      1  < 0.1%
0.01                       4  < 0.1%
0.02                       2  < 0.1%
0.0237                     1  < 0.1%

Value                  Count  Frequency (%)
100                       52  0.5%
99.7                       1  < 0.1%
99.4                       1  < 0.1%
99                         1  < 0.1%
93.33                     39  0.4%
saturated-fat_100g
Real number (ℝ≥0)

MISSING
ZEROS

Distinct: 872
Distinct (%): 9.8%
Missing: 1095
Missing (%): 10.9%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.957228433
Minimum: 0
Maximum: 100
Zeros: 2505
Zeros (%): 25.1%
Negative: 0
Negative (%): 0.0%
Memory size: 78.2 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 1.79
Q3: 7.14
95th percentile: 19.34
Maximum: 100
Range: 100
Interquartile range (IQR): 7.14

Descriptive statistics

Standard deviation: 7.342593803
Coefficient of variation (CV): 1.481189318
Kurtosis: 17.41727397
Mean: 4.957228433
Median Absolute Deviation (MAD): 1.79
Skewness: 2.994774817
Sum: 44144.1192
Variance: 53.91368376
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                  Count  Frequency (%)
0                       2505  25.1%
0.1                      288  2.9%
0.5                      150  1.5%
3.57                     134  1.3%
0.2                      117  1.2%
1                        115  1.1%
0.3                      106  1.1%
7.14                      99  1.0%
0.4                       93  0.9%
10                        77  0.8%
Other values (862)      5221  52.2%
(Missing)               1095  10.9%

Value                  Count  Frequency (%)
0                       2505  25.1%
0.0001                     1  < 0.1%
0.001                      3  < 0.1%
0.003                      1  < 0.1%
0.0052                     1  < 0.1%

Value                  Count  Frequency (%)
100                        1  < 0.1%
92.1                       1  < 0.1%
86.67                      2  < 0.1%
79                         1  < 0.1%
64.29                      1  < 0.1%

carbohydrates_100g
Real number (ℝ≥0)

MISSING
ZEROS

Distinct: 1928
Distinct (%): 20.4%
Missing: 567
Missing (%): 5.7%
Infinite: 0
Infinite (%): 0.0%
Mean: 32.03619103
Minimum: 0
Maximum: 164
Zeros: 733
Zeros (%): 7.3%
Negative: 0
Negative (%): 0.0%
Memory size: 78.2 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 5.7
Median: 20.85
Q3: 58.82
95th percentile: 81.34
Maximum: 164
Range: 164
Interquartile range (IQR): 53.12

Descriptive statistics

Standard deviation: 29.11406654
Coefficient of variation (CV): 0.9087867691
Kurtosis: -1.06454764
Mean: 32.03619103
Median Absolute Deviation (MAD): 19.85
Skewness: 0.5516410967
Sum: 302197.39
Variance: 847.6288706
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                  Count  Frequency (%)
0                        733  7.3%
3.57                     150  1.5%
6.67                     110  1.1%
0.5                      108  1.1%
50                       107  1.1%
75                        86  0.9%
1                         85  0.9%
100                       84  0.8%
60                        81  0.8%
80                        78  0.8%
Other values (1918)     7811  78.1%
(Missing)                567  5.7%

Value                  Count  Frequency (%)
0                        733  7.3%
0.02                       1  < 0.1%
0.04                       1  < 0.1%
0.1                       24  0.2%
0.14                       2  < 0.1%

Value                  Count  Frequency (%)
164                        1  < 0.1%
100                       84  0.8%
99.5                       1  < 0.1%
99.2                       1  < 0.1%
99                         2  < 0.1%

sugars_100g
Real number (ℝ≥0)

MISSING
ZEROS

Distinct: 1488
Distinct (%): 15.8%
Missing: 583
Missing (%): 5.8%
Infinite: 0
Infinite (%): 0.0%
Mean: 15.55183244
Minimum: 0
Maximum: 100
Zeros: 1342
Zeros (%): 13.4%
Negative: 0
Negative (%): 0.0%
Memory size: 78.2 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 1.2
Median: 5.29
Q3: 22.88
95th percentile: 61.244
Maximum: 100
Range: 100
Interquartile range (IQR): 21.68

Descriptive statistics

Standard deviation: 20.9520734
Coefficient of variation (CV): 1.347241457
Kurtosis: 2.52318645
Mean: 15.55183244
Median Absolute Deviation (MAD): 5.29
Skewness: 1.736750624
Sum: 146451.6061
Variance: 438.9893797
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                  Count  Frequency (%)
0                       1342  13.4%
3.57                     261  2.6%
0.5                      228  2.3%
3.33                     143  1.4%
1                        123  1.2%
20                        80  0.8%
0.7                       79  0.8%
10                        78  0.8%
0.8                       77  0.8%
2                         75  0.8%
Other values (1478)     6931  69.3%
(Missing)                583  5.8%

Value                  Count  Frequency (%)
0                       1342  13.4%
0.0001                     1  < 0.1%
0.001                      2  < 0.1%
0.01                       3  < 0.1%
0.02                       2  < 0.1%

Value                  Count  Frequency (%)
100                       32  0.3%
99.5                       2  < 0.1%
99                         2  < 0.1%
98.82                      1  < 0.1%
97.6                       1  < 0.1%

fiber_100g
Real number (ℝ)

MISSING
ZEROS

Distinct: 275
Distinct (%): 3.7%
Missing: 2551
Missing (%): 25.5%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.90515049
Minimum: -6.7
Maximum: 99
Zeros: 2487
Zeros (%): 24.9%
Negative: 1
Negative (%): < 0.1%
Memory size: 78.2 KiB

Quantile statistics

Minimum: -6.7
5th percentile: 0
Q1: 0
Median: 1.6
Q3: 3.6
95th percentile: 10
Maximum: 99
Range: 105.7
Interquartile range (IQR): 3.6

Descriptive statistics

Standard deviation: 4.805121779
Coefficient of variation (CV): 1.654000987
Kurtosis: 67.29296888
Mean: 2.90515049
Median Absolute Deviation (MAD): 1.6
Skewness: 5.879151637
Sum: 21640.466
Variance: 23.08919531
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                  Count  Frequency (%)
0                       2487  24.9%
3.6                      322  3.2%
1.8                      153  1.5%
0.8                      151  1.5%
2                        147  1.5%
3.3                      140  1.4%
7.1                      139  1.4%
0.5                      134  1.3%
6.7                      127  1.3%
2.4                      125  1.2%
Other values (265)      3524  35.2%
(Missing)               2551  25.5%

Value                  Count  Frequency (%)
-6.7                       1  < 0.1%
0                       2487  24.9%
0.01                       1  < 0.1%
0.07                       1  < 0.1%
0.1                       40  0.4%

Value                  Count  Frequency (%)
99                         1  < 0.1%
86.2                       1  < 0.1%
77.8                       1  < 0.1%
72.5                       1  < 0.1%
66.7                       1  < 0.1%

proteins_100g
Real number (ℝ≥0)

ZEROS

Distinct: 933
Distinct (%): 9.4%
Missing: 91
Missing (%): 0.9%
Infinite: 0
Infinite (%): 0.0%
Mean: 7.13126703
Minimum: 0
Maximum: 86.36
Zeros: 1895
Zeros (%): 18.9%
Negative: 0
Negative (%): 0.0%
Memory size: 78.2 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0.8
Median: 5
Q3: 10
95th percentile: 23.4
Maximum: 86.36
Range: 86.36
Interquartile range (IQR): 9.2

Descriptive statistics

Standard deviation: 7.985387317
Coefficient of variation (CV): 1.119771183
Kurtosis: 6.949528802
Mean: 7.13126703
Median Absolute Deviation (MAD): 4.5
Skewness: 1.96964073
Sum: 70663.725
Variance: 63.7664106
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                  Count  Frequency (%)
0                       1895  18.9%
7.14                     195  1.9%
0.5                      164  1.6%
10                       152  1.5%
6.67                     132  1.3%
3.57                     130  1.3%
25                       128  1.3%
3.33                     128  1.3%
5                        126  1.3%
12.5                      96  1.0%
Other values (923)      6763  67.6%

Value                  Count  Frequency (%)
0                       1895  18.9%
0.01                       5  0.1%
0.02                       2  < 0.1%
0.07                       1  < 0.1%
0.1                       59  0.6%

Value                  Count  Frequency (%)
86.36                      1  < 0.1%
81.3                       1  < 0.1%
80                         1  < 0.1%
77.27                      1  < 0.1%
73.3                       1  < 0.1%

salt_100g
Real number (ℝ≥0)

MISSING
ZEROS

Distinct1619
Distinct (%)16.6%
Missing228
Missing (%)2.3%
Infinite0
Infinite (%)0.0%
Mean1.595388073
Minimum0
Maximum124.46
Zeros1282
Zeros (%)12.8%
Negative0
Negative (%)0.0%
Memory size78.2 KiB
2021-03-23T08:35:06.342000image/svg+xmlMatplotlib v3.3.4, https://matplotlib.org/

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0.06099
median: 0.58
Q3: 1.37
95-th percentile: 3.90144
Maximum: 124.46
Range: 124.46
Interquartile range (IQR): 1.30901
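The salt_100g quantiles above hint at a long right tail. One common way to quantify it is Tukey's 1.5 × IQR rule. The report itself does not apply this rule, so the sketch below is only an illustration using the reported quantiles; the fence multiplier of 1.5 is the conventional choice, not something taken from the report.

```python
# Quantiles copied from the salt_100g statistics in this report.
q1, q3 = 0.06099, 1.37
p95 = 3.90144
maximum = 124.46

iqr = q3 - q1                   # interquartile range
lower_fence = q1 - 1.5 * iqr    # values below this would be flagged as low outliers
upper_fence = q3 + 1.5 * iqr    # values above this are flagged as high outliers

print(f"IQR         = {iqr:.5f}")
print(f"lower fence = {lower_fence:.6f}")
print(f"upper fence = {upper_fence:.6f}")
print("95th percentile above upper fence:", p95 > upper_fence)
print("maximum / upper fence:", round(maximum / upper_fence, 1))
```

Because the 95-th percentile already exceeds the upper fence, roughly 5% or more of the non-missing salt values would be flagged under this rule, while the negative lower fence means no low outliers are possible for a variable whose minimum is 0.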

Descriptive statistics

Standard deviation: 6.608683218
Coefficient of variation (CV): 4.142367195
Kurtosis: 153.6917019
Mean: 1.595388073
Median Absolute Deviation (MAD): 0.5546
Skewness: 11.62212638
Sum: 15590.13225
Variance: 43.67469388
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value | Count | Frequency (%)
0 | 1282 | 12.8%
0.01 | 183 | 1.8%
0.1 | 155 | 1.6%
1 | 101 | 1.0%
0.03 | 82 | 0.8%
0.0254 | 73 | 0.7%
1.1 | 70 | 0.7%
1.3 | 69 | 0.7%
1.5 | 68 | 0.7%
1.8 | 67 | 0.7%
Other values (1609) | 7622 | 76.2%
(Missing) | 228 | 2.3%

Minimum 5 values

Value | Count | Frequency (%)
0 | 1282 | 12.8%
5.2 × 10⁻⁵ | 1 | < 0.1%
0.0001 | 1 | < 0.1%
0.00025 | 1 | < 0.1%
0.00033 | 1 | < 0.1%

Maximum 5 values

Value | Count | Frequency (%)
124.46 | 1 | < 0.1%
105.83418 | 2 | < 0.1%
100.0125 | 1 | < 0.1%
100 | 3 | < 0.1%
99.90582 | 4 | < 0.1%

Interactions


Correlations


Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, so for a perfectly linear relationship the steepness of the line (its angle to the x-axis) does not affect r; only the sign of the slope does.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
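The calculation described above can be sketched in plain Python. This is a minimal illustration, not the code the report uses internally; the function name `pearson_r` and the sample data are assumptions made for the example.

```python
import math

def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # The 1/n (or 1/(n-1)) factors cancel between numerator and denominator,
    # so plain sums of products are enough.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

x = [1.0, 2.0, 3.0, 4.0]
y_up = [2.0 * v + 1.0 for v in x]      # steep rising line
y_down = [-0.5 * v + 10.0 for v in x]  # shallow falling line

r_up = pearson_r(x, y_up)
r_down = pearson_r(x, y_down)
print(r_up, r_down)
```

Both lines are perfectly linear, so the magnitude of r is 1 in each case regardless of slope or offset; only the sign differs.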

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better than Pearson's r at catching nonlinear monotonic correlations. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
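As a rough sketch of that recipe, the code below ranks each variable and then applies the Pearson formula to the ranks. The helper names are illustrative, and ties are handled by giving tied values their average rank, which is a common convention assumed here rather than something the report states.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def pearson_r(x, y):
    """Covariance divided by the product of the standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def spearman_rho(x, y):
    """Pearson correlation of the rank variables of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))

x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]   # monotonic but nonlinear

rho = spearman_rho(x, y)
r = pearson_r(x, y)
print(rho, r)
```

On this cubic example ρ reaches 1 because the relationship is perfectly monotonic, while Pearson's r stays below 1 because the relationship is not linear.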

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
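The pair-counting description above corresponds to the simplest form of the coefficient (often called tau-a). The sketch below implements exactly that form; tied pairs count as neither concordant nor discordant. Statistical libraries frequently apply a tie correction (tau-b), so their results can differ from this sketch when the data contain ties. The function name and sample data are illustrative.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    concordant = discordant = pairs = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        pairs += 1
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:      # both variables move the same way
            concordant += 1
        elif direction < 0:    # the variables move in opposite ways
            discordant += 1
        # direction == 0 is a tie and counts toward neither total
    return (concordant - discordant) / pairs

x = [1, 2, 3, 4]
y = [1, 3, 2, 4]   # one swapped pair out of six

tau = kendall_tau_a(x, y)
print(tau)
```

With four observations there are six pairs; five are concordant and one is discordant, giving τ = 4/6.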

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. The Phik project provides extensive documentation.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.
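Nullity correlation can be read as an ordinary Pearson correlation computed on missingness indicators (1 where a value is missing, 0 where it is present). The sketch below assumes that reading. The column names are borrowed from this dataset, but the rows are invented for illustration and the function name is hypothetical.

```python
import math

def nullity_correlation(rows, col_a, col_b):
    """Pearson correlation between the missingness indicators of two columns."""
    a = [1.0 if row[col_a] is None else 0.0 for row in rows]
    b = [1.0 if row[col_b] is None else 0.0 for row in rows]
    n = len(rows)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    if var_a == 0 or var_b == 0:
        return None  # a column that is never (or always) missing has no defined correlation
    return cov / math.sqrt(var_a * var_b)

# Invented toy rows; None marks a missing cell.
rows = [
    {"fat_100g": 1.0,  "saturated-fat_100g": 0.5,  "salt_100g": 0.1},
    {"fat_100g": None, "saturated-fat_100g": None, "salt_100g": 0.2},
    {"fat_100g": 2.0,  "saturated-fat_100g": None, "salt_100g": None},
    {"fat_100g": None, "saturated-fat_100g": None, "salt_100g": 0.3},
    {"fat_100g": 3.0,  "saturated-fat_100g": 1.0,  "salt_100g": None},
    {"fat_100g": 0.5,  "saturated-fat_100g": 0.2,  "salt_100g": 0.4},
]

fat_vs_satfat = nullity_correlation(rows, "fat_100g", "saturated-fat_100g")
fat_vs_salt = nullity_correlation(rows, "fat_100g", "salt_100g")
print(fat_vs_satfat, fat_vs_salt)
```

A positive value means the two columns tend to be missing in the same rows; a negative value means one tends to be present where the other is missing.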

Sample

First rows

(index) | df_index | Unnamed: 0 | product_name | brands | ingredients_text | nutrition_grade_fr | energy_100g | fat_100g | saturated-fat_100g | carbohydrates_100g | sugars_100g | fiber_100g | proteins_100g | salt_100g
0 | 92370 | 95036 | Premium Ice Cream | Casper's Ice Cream Inc. | Fresh whole milk, sugar, cream, mint flake (corn syrup, sugar, partially hydrogenated soybean oil, red lake 40, green 3, red 40, peppermint oil, soy lecithin), corn syrup solids, non-fat dry milk, stabilizer (microcrystalline cellulose, cellulose gum, mon | d | 908.0 | 10.14 | 5.80 | 28.99 | 20.29 | 0.0 | 2.90 | 0.16510
1 | 145467 | 150904 | Kohler, Strawberry Balsamic Rare Facets Chocolate | Kohler Original Recipe Chocolates | 61% chocolate (cocoa beans, pure cane sugar, cocoa butter, soy lecithin, vanilla bean), milk chocolate (pure cane sugar, full cream milk, cocoa butter, cocoa beans, soy lecithin, vanilla bean), strawberry puree (strawberry 87%, invert sugar syrup), 55% ch | c | 314.0 | 5.00 | 2.50 | 10.00 | 10.00 | 0.0 | 0.00 | 0.00000
2 | 100452 | 103365 | Center Cut Chops Boneless Thin Pork | Target Stores | Pork, *solution ingredients: water, potassium lactate, sodium phosphates, salt, diacetate. | NaN | 640.0 | 7.06 | 2.35 | 0.00 | NaN | NaN | 18.82 | 0.71628
3 | 172539 | 179915 | Corn Flakes Glacés au sucre | Crownfield | Maïs 77% Sucre 28% Extrait de Malte d'orge Sel | c | 1597.0 | 0.30 | 0.10 | 86.10 | 29.60 | 3.2 | 5.70 | 0.73000
4 | 56595 | 57919 | Homekist, Sandwich Cookies, Lemon Creme | Vista Bakery Inc. | Enriched flour (wheat flour, niacin, reduced iron, thiamine mononitrate, riboflavin, folic acid), sugar, vegetable oil (contains one or more of the following: canola oil, corn oil, palm oil, soybean oil), dextrose, high fructose corn syrup, corn syrup, co | d | 1975.0 | 19.44 | 5.56 | 72.22 | 30.56 | 0.0 | 5.56 | 0.67056
5 | 172980 | 180553 | Biscino Chocolat Noir | Sondey | NaN | e | 2054.0 | 23.30 | 14.80 | 60.50 | 30.90 | NaN | 6.90 | 0.38000
6 | 53491 | 54715 | Turkey Franks | Jennie-O, Jennie-O Turkey Store Inc. | Mechanically separated turkey, water, salt, contains 2% or less modified food starch, potassium lactate, potassium acetate, sodium diacetate, seasoning (corn syrup solids, dextrose, sugar, paprika, sodium erythorbate, spice extractives), natural smoke fla | d | 895.0 | 17.86 | 4.46 | 1.79 | 0.00 | 0.0 | 12.50 | 2.90322
7 | 14038 | 14545 | Country Blend Mixed Vegetables | Big Y | Water, carrots, potatoes, celery, sweet peas, green beans, corn, lima beans, salt, calcium chloride (to maintain firmness), onion flavoring. | b | 134.0 | 0.00 | 0.00 | 6.40 | 2.40 | 0.8 | 0.80 | 0.58928
8 | 179148 | 190504 | Marc Guiselin | Aldi | NaN | e | 3048.0 | 82.00 | 56.40 | 0.20 | 0.20 | 0.0 | 0.60 | 0.25400
9 | 53240 | 54460 | Juice Cocktail From Concentrate, Cranberry | Best Yet, S. M. Flickinger Co. Inc. | Filtered water, high fructose corn syrup, cranberry juice concentrate, ascorbic acid (vitamin c). | NaN | 243.0 | 0.00 | NaN | 14.17 | 14.17 | NaN | 0.00 | 0.03810

Last rows

(index) | df_index | Unnamed: 0 | product_name | brands | ingredients_text | nutrition_grade_fr | energy_100g | fat_100g | saturated-fat_100g | carbohydrates_100g | sugars_100g | fiber_100g | proteins_100g | salt_100g
9990 | 226067 | 250928 | Mûre Framboise Cranberry | Les 4 Saisons | Mélange de fruits (Mûre, Framboise et Cranberry), sucre | d | 1016.0 | 0.40 | 0.00 | 60.00 | 59.00 | NaN | 0.80 | 0.00000
9991 | 8544 | 8767 | Chocolate Peanut Clusters | Sweet Smiles, New England Confectionery Company Inc. | Peanuts, milk chocolate [sugar, cocoa butter, chocolate liquor, milk, soy lecithin, vanillin (artificial flavor)]. | e | 2987.0 | 50.00 | 14.29 | 59.52 | 45.24 | 4.80 | 19.05 | 0.21082
9992 | 16389 | 16949 | Double"Q", Wild Alaskan Skinless & Boneless Pink Salmon | NaN | Pink salmon, salt. | a | 464.0 | 2.38 | 0.00 | 0.00 | 0.00 | 0.00 | 19.05 | 0.88646
9993 | 235161 | 262145 | Mogettes cuisinées à base de Mogette de Vendée Label Rouge | Nos régions ont du talent | Eau, Mogette de Vendée Label Rouge sèche trempée (38%), carottes (7,6%), lardons cuits fumés rissolés (poitrine de porc, sel) (3%), sel, fond de porc (arômes naturels, graisse et extrait de porc, eau, sel, sucre), aromates (ail, thym, laurier en poudre). | a | 280.0 | 0.90 | 0.20 | 7.30 | 0.90 | 4.00 | 5.30 | 0.94800
9994 | 20270 | 20897 | Petite Cut Diced Tomatoes With Zesty Jalapenos | Del Monte | Tomatoes, tomato juice, jalapeno peppers, contains less than 1% of the following: salt, dehydrated onions, distilled vinegar, citric acid, spices, garlic powder, calcium chloride, natural flavor, onion powder | b | 100.0 | 0.00 | 0.00 | 4.76 | 2.38 | 0.80 | 0.79 | 0.74676
9995 | 287019 | 339514 | Tortas De Maiz Gullón S / Gluten 130GR | MILHO CORN | NaN | a | 109.0 | NaN | 0.10 | NaN | 0.60 | 1.90 | 8.00 | 1.30000
9996 | 22123 | 22795 | 2% Milk Fat Reduced Fat Milk | Darigold | Reduced fat milk, dha algal oil‡, vitamin a palmitate, vitamin d3. | b | 226.0 | 2.08 | 1.25 | 5.42 | 5.00 | 0.00 | 3.33 | 0.13716
9997 | 247194 | 278537 | Apéro Tapas Aux Ténébrions Et Aux Grillons Bio & Naturellement Sans Gluten | Jimini s | INGRÉDIENTS ténébrions" entiers déshydratés (ténébrion Molitor ) grillons entier déshydratés oignon , betterave rouge dextrose ail poivre noir thym persil origan basilic ciboulette huile de tournesol | d | 2033.0 | 27.81 | 6.92 | 5.80 | 0.70 | 6.21 | 53.07 | 4.68000
9998 | 95617 | 98326 | Thin Wheat Baked Snack Crackers | Great Value, Wal-Mart Stores Inc. | Whole grain wheat flour, sugar, soybean oil, salt, cornstarch, malt syrup (from corn and barley), invert sugar, vegetable color (annatto extract and turmeric oleoresin), baking soda. | c | 1732.0 | 10.34 | 0.00 | 75.86 | 17.24 | 10.30 | 6.90 | 1.57734
9999 | 123798 | 128798 | Poker Wafer, Vanilla Cream | Gastone Lago Elledi | Wheat flour, sugar, vegetable oils (palm, coconut), whey powder, soy lecithin, salt, sodium hydrogen carbonate, vanilla extract. | e | 2230.0 | 26.67 | 20.00 | 66.67 | 43.33 | 0.00 | 3.33 | 0.38100